Here we present a summary of the processing steps applied to the WASH dataset.
Three WASH variables were created following the WHO definitions (Damazo); see the codebook for variable labels:
cat_watersource
cat_toilettype
cat_garbagedisposal
For every variable, cases coded as NIU or "Missing: Impute" were recoded to NA.
Cases with NA in Gender and Age were dropped entirely.
Smaller categories were combined into an "Other" category.
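The three cleaning steps above can be sketched in base R as follows. This is a minimal sketch on made-up data. The column names `gender` and `age`, the exact label strings, and the merging threshold `min_n` are hypothetical. It also assumes a case is dropped when either Gender or Age is missing, and that the WASH variables are stored as character vectors.

```r
# Hypothetical example data; real column names and labels may differ
wash <- data.frame(
  cat_watersource = c("Piped", "NIU", "Borehole", "Piped", "Rain", "Piped"),
  cat_toilettype  = c("Flush", "Pit", "Missing: Impute", "Flush", "Pit", "Flush"),
  gender          = c("F", "M", NA, "F", "M", "F"),
  age             = c(34, 51, 40, NA, 29, 62),
  stringsAsFactors = FALSE
)
wash_vars <- c("cat_watersource", "cat_toilettype")

# Step 1: recode NIU and "Missing: Impute" to NA in every WASH variable
for (v in wash_vars) {
  wash[[v]][wash[[v]] %in% c("NIU", "Missing: Impute")] <- NA
}

# Step 2: drop cases with a missing value in gender or age
# (assumption: "NA in Gender and Age" means missing in either column)
wash <- wash[complete.cases(wash[, c("gender", "age")]), ]

# Step 3: merge categories with fewer than `min_n` cases into "Other"
# (hypothetical threshold; table() ignores NA by default)
lump_small <- function(x, min_n) {
  counts <- table(x)
  small <- names(counts)[counts < min_n]
  x[x %in% small] <- "Other"
  x
}
for (v in wash_vars) {
  wash[[v]] <- lump_small(wash[[v]], min_n = 2)
}
```

In this toy example, two rows are dropped in step 2. "Rain" is the only remaining water-source category with fewer than two cases, so it becomes "Other".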
A composite variable was created from the three WASH variables using logistic PCA.
Household total expenditure was centered.
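Centering subtracts the mean, so that a model intercept refers to a household with average total expenditure. Here is a minimal sketch with made-up values. The variable name `hh_expenditure` is hypothetical, and `na.rm = TRUE` assumes missing values should be ignored when computing the mean.

```r
# Hypothetical household total expenditure values
hh_expenditure <- c(1200, 800, 1500, 500)

# Subtract the sample mean; 0 now corresponds to an average household
hh_expenditure_c <- hh_expenditure - mean(hh_expenditure, na.rm = TRUE)

# Equivalent base-R form (scale() returns a one-column matrix)
hh_expenditure_c2 <- as.numeric(scale(hh_expenditure, center = TRUE, scale = FALSE))
```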
For every case, we counted the number of WASH indicators they had access to (maximum = 3) and calculated the corresponding proportion (not sure what to call this rate). Is it possible to model the total as a Poisson process?
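On the Poisson question: the total is bounded at 3, but a Poisson model assigns probability to counts above 3. One alternative worth considering is a binomial model with 3 trials per case, whose fitted mean is 3 times the fitted proportion. The sketch below fits both models for comparison. The indicator names, `wealthindex` values and data are hypothetical.

```r
# Hypothetical binary access indicators (1 = has access) for five cases
access <- data.frame(
  water       = c(1, 1, 0, 1, 0),
  toilet      = c(1, 0, 0, 1, 1),
  garbage     = c(1, 0, 0, 0, 1),
  wealthindex = c(2.1, 0.3, -1.5, 1.2, -0.4)
)
indicators <- c("water", "toilet", "garbage")

# Number of WASH indicators with access (0 to 3) and the proportion
access$wash_total <- rowSums(access[, indicators])
access$wash_prop  <- access$wash_total / length(indicators)

# Binomial model: successes out of 3 trials, respecting the upper bound
fit_binom <- glm(cbind(wash_total, length(indicators) - wash_total) ~ wealthindex,
                 data = access, family = binomial)

# Poisson model of the total, for comparison (no upper bound)
fit_pois <- glm(wash_total ~ wealthindex, data = access, family = poisson)
```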
Plots were created for the individual WASH variables, but initial modelling uses the composite WASH variable.
We also present results from a generalized linear mixed-effects model (GLMM) fitted with the lme4 package (glmer).
Use a scoring approach (e.g., PCA) to create a composite WASH variable, then apply a GLMM.
Apply multivariate mixed models, either using a pseudo-multivariate approach in glmer or using other approaches proposed by Samuel.
Assume equal weights for each WASH indicator and model the total as count data, using a Poisson or negative binomial model.
Model each WASH indicator separately.
Any other suggestions?
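Here is a minimal sketch of two of these options on simulated data. The first is a composite score from the first principal component. This uses ordinary PCA via `prcomp`, not the logistic PCA used in the processing steps, which is better suited to binary data. The second fits a separate logistic regression for each indicator. The data, coefficients and column names are hypothetical, and random effects are omitted so the example stays in base R.

```r
set.seed(1)
n <- 200
wealthindex <- rnorm(n)

# Hypothetical binary indicators that all depend on wealthindex
dat <- data.frame(
  water       = rbinom(n, 1, plogis(0.5 + 1.0 * wealthindex)),
  toilet      = rbinom(n, 1, plogis(-0.2 + 0.8 * wealthindex)),
  garbage     = rbinom(n, 1, plogis(-0.5 + 0.6 * wealthindex)),
  wealthindex = wealthindex
)
indicators <- c("water", "toilet", "garbage")

# Option: composite score = first principal component of the indicators
pca <- prcomp(dat[, indicators], center = TRUE, scale. = TRUE)
score <- pca$x[, 1]
# The sign of a principal component is arbitrary; flip it so that
# higher scores mean access to more services
if (cor(score, rowSums(dat[, indicators])) < 0) score <- -score
dat$wash_score <- score

# Option: a separate logistic regression for each indicator
fits <- lapply(indicators, function(v) {
  glm(reformulate("wealthindex", response = v), data = dat, family = binomial)
})
names(fits) <- indicators
slopes <- sapply(fits, function(f) coef(f)[["wealthindex"]])
```

The composite score could then replace the individual indicators as the outcome of a mixed model. The per-indicator fits ignore correlation between services, which is what the multivariate option addresses.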
The table below summarizes the proportion of missing values for each variable.
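A table like this can be computed in one line with `colMeans(is.na(...))`. Here is a minimal sketch on a hypothetical data frame `df`.

```r
# Hypothetical data frame with some missing values
df <- data.frame(
  cat_watersource = c("Piped", NA, "Borehole", "Piped"),
  gender          = c("F", "M", NA, NA),
  age             = c(34, 51, 40, 29)
)

# Proportion of missing values per variable (mean of TRUE/FALSE per column)
missing_prop <- colMeans(is.na(df))
missing_table <- data.frame(variable = names(missing_prop),
                            prop_missing = unname(missing_prop))
```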
We begin by showing the distribution of the individual WASH variables (indicators) over time and space (slum area). Thereafter, we show the distribution of the demographic, social and economic variables of interest by the composite WASH variable.
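The summaries behind such plots can be sketched as group-wise proportions. This example assumes hypothetical columns `year`, `slum_area` and a binary `improved_water` indicator; the real WASH variables are categorical and may need a cross-tabulation instead, as in the last line.

```r
# Hypothetical records: one row per household and year
obs <- data.frame(
  year           = c(2015, 2015, 2016, 2016, 2016, 2017),
  slum_area      = c("A", "B", "A", "B", "B", "A"),
  improved_water = c(1, 0, 1, 1, 0, 1)
)

# Share of households with access, by year (time) and by slum area (space)
by_year <- aggregate(improved_water ~ year, data = obs, FUN = mean)
by_area <- aggregate(improved_water ~ slum_area, data = obs, FUN = mean)

# Row proportions of a categorical variable within each year
tab <- prop.table(table(obs$year, obs$improved_water), margin = 1)
```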
We picked one of the predictor variables, wealthindex, and initialised values of \(\beta\) and the intercept for each of the three services. The linear predictor was then computed as:
predicted <- intercept + beta * wealthindex
We then sampled a fake outcome y from a binomial distribution and fitted a logistic regression:
y <- rbinom(n, 1, plogis(predicted))
model <- glm(service1 ~ wealthindex, data = data, family = "binomial")
Next, we included the household ID (hhid_anon) as a random intercept:
library(lme4)
model <- glmer(service1 ~ wealthindex + (1 | hhid_anon), data = data, family = binomial)
The three simulated WASH variables were then transformed to long format, where services is the service label and status takes \(0\) or \(1\):
model <- glmer(status ~ wealthindex * services + 0 + (services + 0 | hhid_anon), data = data, family = binomial)
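The simulation steps above can be assembled into one self-contained script. This is a sketch under stated assumptions. The number of households (`n_hh`), repeated observations per household (`n_rep`), the intercept and \(\beta\) values, and the household effect `u` (SD 0.5 per service) are all made up. The fixed-effects `glm` uses `services + services:wealthindex + 0`, an equivalent reparameterisation of `wealthindex * services + 0` that gives one slope per service directly. The `glmer` call runs only when lme4 is installed.

```r
set.seed(2024)

# Hypothetical design: n_hh households, each observed n_rep times
n_hh  <- 300
n_rep <- 3
hhid_anon <- rep(seq_len(n_hh), each = n_rep)
n <- length(hhid_anon)
wealthindex <- rnorm(n)

# Assumed starting values of the intercept and beta for the three services
intercept <- c(service1 = -0.5, service2 = 0.2, service3 = 0.8)
beta      <- c(service1 =  1.0, service2 = 0.5, service3 = -0.3)

# Assumed household random intercept for each service (SD 0.5)
u <- matrix(rnorm(n_hh * 3, sd = 0.5), nrow = n_hh,
            dimnames = list(NULL, names(intercept)))

# Simulate a binary status for each service from a logistic model
data <- data.frame(hhid_anon = hhid_anon, wealthindex = wealthindex)
for (s in names(intercept)) {
  predicted <- intercept[[s]] + beta[[s]] * wealthindex + u[hhid_anon, s]
  data[[s]] <- rbinom(n, 1, plogis(predicted))
}

# Single-service logistic regression, as in the text
model1 <- glm(service1 ~ wealthindex, data = data, family = "binomial")

# Long format: one row per observation and service
long <- data.frame(
  hhid_anon   = rep(data$hhid_anon, times = 3),
  wealthindex = rep(data$wealthindex, times = 3),
  services    = factor(rep(names(intercept), each = n)),
  status      = c(data$service1, data$service2, data$service3)
)

# Fixed-effects-only check: one intercept and one slope per service
fit_fixed <- glm(status ~ services + services:wealthindex + 0,
                 data = long, family = binomial)
est <- coef(fit_fixed)
est_slope <- est[grepl("wealthindex", names(est))]

# Multivariate mixed model from the text; runs only if lme4 is installed
if (requireNamespace("lme4", quietly = TRUE)) {
  model_mv <- lme4::glmer(status ~ wealthindex * services + 0 +
                            (services + 0 | hhid_anon),
                          data = long, family = binomial)
}
```

Comparing `est_slope` with `beta` is a quick check that the simulation and long-format reshaping are consistent. The fixed-effects estimates are slightly attenuated because they ignore the household effect.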